Robust Income Distribution Estimation with Missing Data
With income distributions it is common to encounter the problem of missing data. When a parametric model is fitted to the data, the problem can be overcome by specifying the marginal distribution of the observed data. With classical estimation methods such as maximum likelihood (ML), an estimator of the parameters can be obtained in a straightforward manner. Unfortunately, it is well known that ML estimators are not robust in the presence of contaminated data. In this paper, we propose a robust alternative to the ML estimator with truncated data, namely one based on M-estimators that we call the EMM estimator. We present an extensive simulation study where the EMM estimator based on optimal B-robust estimators (OBRE) is compared to a more conservative approach based on marginal density (MD) for truncated data, and show that the difference lies in the way the weights associated with each observation are computed. Finally, we also compare the EMM estimator based on the OBRE with the classical ML estimator when the data are contaminated, and show that, contrary to the former, the latter can be seriously biased.
Keywords: M-estimators, influence function, EM algorithm, truncated data.
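As a rough illustration of fitting a model through the distribution of the observed (truncated) data, the sketch below uses an exponential model truncated from below. It shows plain ML only, not the EMM or OBRE estimators of the paper; the exponential model, the threshold, the seed and all function names are assumptions chosen for the example.

```python
import random

def simulate_truncated(rate=2.0, threshold=0.5, draws=20000, seed=7):
    """Draw exponential values and keep only those above the threshold."""
    rng = random.Random(seed)
    sample = (rng.expovariate(rate) for _ in range(draws))
    return [x for x in sample if x > threshold]

def naive_rate(observed):
    """ML estimate of the rate that wrongly ignores the truncation."""
    return len(observed) / sum(observed)

def truncated_rate(observed, threshold):
    """ML estimate based on the density of the observed data.

    For x > threshold the observed density is
    rate * exp(-rate * (x - threshold)), so the ML estimate is
    1 / mean(x - threshold).
    """
    return len(observed) / sum(x - threshold for x in observed)

observed = simulate_truncated()
print("naive estimate:    ", naive_rate(observed))
print("truncated estimate:", truncated_rate(observed, 0.5))
```

In this toy setting the naive estimate settles near half the true rate, while the estimate built on the truncated density stays close to it.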
Robust inference with binary data
In this paper, the robustness properties of the maximum likelihood estimator (MLE) and of several robust estimators for the logistic regression model with binary responses are analysed. It is found that the MLE and the classical Rao's score test can be misleading in the presence of model misspecification, which in the context of logistic regression means either misclassification errors in the responses or extreme data points in the design space. A general framework for robust estimation and testing is presented, together with a robust estimator and a robust testing procedure. It is shown that they are less influenced by model misspecifications than their classical counterparts. They are finally applied to the analysis of binary data from a study on breastfeedin
Bounded-Bias Robust Estimation in Generalized Linear Latent Variable Models
This paper proposes a robust estimator for a general class of linear latent variable models (GLLVM) (Moustaki and Knott 2000, Bartholomew and Knott 1999). It is based on a weighted score function that is simple to implement numerically and is made consistent using the basic idea of indirect inference. The need for a robust estimator for these models is motivated by the study of the effect of model deviations, such as data contamination, on the maximum likelihood estimator (MLE). This is done with the use of the influence function (Hampel 1968, 1974) and the gross error sensitivity (Hampel, Ronchetti, Rousseeuw, and Stahel 1986). Simulation studies show that the MLE can be seriously biased by model deviations. The performance of the robust estimator in terms of bias and variance is compared to that of the MLE with simulation studies and with a real example from a consumption survey.
Keywords: latent variable models, mixed items, influence function, robust estimation, indirect inference.
A Latent Variable Approach for the Construction of Continuous Health Indicators
In most health surveys, the state of health of individuals is measured through several different kinds of variables, such as qualitative, discrete quantitative or dichotomous ones. From these variables, one aims to build univariate indices of health that summarize the information. To do so, we propose in this paper to use Generalized Linear Latent Variable Models (GLLVM) (see e.g. Bartholomew and Knott 1999), which allow one or more continuous latent variables to be estimated from a set of observable ones. As an application, we consider the data from the 1997 Swiss Health Survey and build two health indicators. The first one describes the health status induced merely by the age of the subject, and the second one complements the first one.
Zero-inflated truncated generalized Pareto distribution for the analysis of radio audience data
Extreme value data with a high clump-at-zero occur in many domains. Moreover,
it might happen that the observed data are truncated below a given
threshold and/or are not reliable enough below that threshold because of
the recording devices. These situations occur, in particular, with radio
audience data measured using personal meters that record environmental noise
every minute, which is then matched to one of several radio programs. There
are therefore genuine zeros for respondents not listening to the radio, but
also zeros corresponding to real listeners for whom the match between the
recorded noise and the radio program could not be achieved. Since radio
audiences are important for radio broadcasters in order, for example, to
determine advertisement price policies, possibly according to the type of
audience at different time points, it is essential to be able to explain not
only the probability of listening to a radio but also the average time spent
listening to the radio by means of the characteristics of the listeners. In
this paper we propose a generalized linear model for the zero-inflated truncated
Pareto distribution (ZITPo) that we use to fit radio audience data. Because it
is based on the generalized Pareto distribution, the ZITPo model has useful
properties: it is invariant to the choice of the threshold, and a natural
residual measure can be derived from it to assess the model fit to the
data. From a general formulation of the most popular models for zero-inflated
data, we derive our model by considering successively the truncated case, the
generalized Pareto distribution and then the inclusion of covariates to explain
the nonzero proportion of listeners and their average listening time. By means
of simulations, we study the performance of the maximum likelihood estimator
(and derived inference) and use the model to fully analyze the audience data of
a radio station in a certain area of Switzerland.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS358 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
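One way to read the threshold invariance mentioned in this abstract is the standard stability property of the generalized Pareto distribution: excesses over a higher threshold are again generalized Pareto with the same shape and a shifted scale. The sketch below checks this numerically. It is not the ZITPo model itself; the parameter values and function names are assumptions, and a positive shape is assumed throughout.

```python
import math

def gpd_survival(x, shape, scale):
    """P(X > x) for a generalized Pareto variable with positive shape."""
    return (1.0 + shape * x / scale) ** (-1.0 / shape)

def excess_survival(y, threshold, shape, scale):
    """P(X - threshold > y | X > threshold) computed from the original model."""
    return gpd_survival(threshold + y, shape, scale) / gpd_survival(threshold, shape, scale)

shape, scale, threshold = 0.3, 2.0, 1.5
shifted_scale = scale + shape * threshold  # scale of the excess distribution

for y in (0.1, 1.0, 5.0, 20.0):
    direct = excess_survival(y, threshold, shape, scale)
    refit = gpd_survival(y, shape, shifted_scale)
    print(y, math.isclose(direct, refit, rel_tol=1e-12))
```

The shape parameter is unchanged by the move to a higher threshold, which is why tail conclusions do not depend on where the threshold is placed.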
Robust estimation of personal income distribution models
Statistical problems in modelling personal income distributions include estimation procedures, testing and model choice. Typically, the parameters of a given model are estimated by classical procedures such as maximum likelihood and least squares estimators. Unfortunately, the classical methods are very sensitive to model deviations such as gross errors in the data, grouping effects or model misspecifications. These deviations can ruin the values of the estimators and inequality measures and can produce false information about the distribution of personal income in a given country. In this paper we discuss the use of robust techniques for the estimation of income distributions. These methods behave like the classical procedures at the model but are less influenced by model deviations and can be applied to general estimation problems.
Keywords: personal income distribution, inequality measures, parametric models, influence function, M-estimator.
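The sensitivity of maximum likelihood to gross errors can be sketched with a Pareto income model whose scale is fixed at 1. The robust alternative below is a simple median-based estimator picked for illustration, not the estimators discussed in the paper; the contamination scheme (2% of incomes multiplied by 1000, as with a misplaced decimal point), the seed and the function names are all assumptions.

```python
import math
import random
import statistics

def pareto_shape_ml(incomes):
    """ML estimate of the Pareto shape when the minimum income is 1."""
    return len(incomes) / sum(math.log(x) for x in incomes)

def pareto_shape_median(incomes):
    """Median-based estimate: the median of log(x) equals ln(2) / shape."""
    return math.log(2) / statistics.median([math.log(x) for x in incomes])

def simulate(true_shape=3.0, n=2000, contaminated_share=0.02, seed=1):
    """Return a clean Pareto sample and a copy containing gross errors."""
    rng = random.Random(seed)
    clean = [rng.paretovariate(true_shape) for _ in range(n)]
    k = int(contaminated_share * n)
    dirty = [x * 1000 for x in clean[:k]] + clean[k:]
    return clean, dirty

clean, dirty = simulate()
print("ML, clean data:       ", pareto_shape_ml(clean))
print("ML, contaminated:     ", pareto_shape_ml(dirty))
print("median, contaminated: ", pareto_shape_median(dirty))
```

A small share of gross errors pulls the ML estimate well away from the true shape, while the median-based estimate moves only slightly. The price is lower efficiency on clean data, which is the trade-off robust estimators are designed to balance.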
Distributional Dominance with Dirty Data
Distributional dominance criteria are commonly applied to draw welfare inferences about comparisons, but conclusions drawn from empirical implementations of dominance criteria may be influenced by data contamination. We examine a non-parametric approach to refining Lorenz-type comparisons and apply the technique to two important examples from the LIS database.
Keywords: distributional dominance, Lorenz curve, robustness.
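How a single contaminated observation can overturn a Lorenz comparison can be sketched with two small samples of equal size. The income values are made up for illustration, and the code only demonstrates the sensitivity of the empirical criterion; it does not implement the refinement proposed in the paper.

```python
def lorenz_ordinates(incomes):
    """Cumulative income shares at p = 1/n, 2/n, ..., 1 for sorted incomes."""
    ordered = sorted(incomes)
    total = sum(ordered)
    running, shares = 0, []
    for x in ordered:
        running += x
        shares.append(running / total)
    return shares

def lorenz_dominates(a, b):
    """True if the Lorenz curve of a lies on or above that of b everywhere.

    Assumes both samples have the same size, so the ordinates are comparable.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return all(la >= lb for la, lb in zip(lorenz_ordinates(a), lorenz_ordinates(b)))

country_a = [2, 3, 4, 5, 6]          # hypothetical incomes
country_b = [1, 2, 4, 6, 7]          # hypothetical incomes
country_a_dirty = [2, 3, 4, 5, 600]  # one gross recording error

print(lorenz_dominates(country_a, country_b))
print(lorenz_dominates(country_a_dirty, country_b))
print(lorenz_dominates(country_b, country_a_dirty))
```

With the clean data the first sample dominates the second; after one value is corrupted the ranking is reversed, which is the kind of fragility that motivates a robust treatment of dominance comparisons.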